feat(gen): batch LLM annotation with --annotation-batch flag #70
Merged
Add configurable batch annotation to reduce LLM round-trips during the semantic annotation pre-pass.

Previously: 36 operations × ~15s per call = ~9 minutes sequential.
With --annotation-batch=10: 4 batches × ~20s per call ≈ 1–2 minutes.

Design:
- --annotation-batch N (default 0 = sequential, one call per op)
- Each batch sends N operations in one prompt; the LLM returns a JSON array keyed by operation_id (not array index) for reliable matching
- Partial failure: if a batch returns fewer entries than requested, ops without a matching operation_id get no annotation and generation continues
- Batch JSON error: ops in that batch get no annotation and a warning is logged
- Engine.SetAnnotationBatch(n) for programmatic control

Reliability:
- operation_id-keyed responses survive LLM reordering or omissions
- parseBatchAnnotations silently drops entries missing operation_id
- Invalid JSON returns a nil map, so all ops in the batch get no annotation
- No cascading retry per op (annotation is best-effort by design)

Tests:
- TestEngine_BatchAnnotation_EmitsOneEventPerOp: TUI progress unaffected
- TestEngine_BatchAnnotation_AnnotatesOpsCorrectly: key-based matching
- TestEngine_BatchAnnotation_BatchFailureIsGraceful: invalid JSON safe
- TestEngine_BatchAnnotation_SplitsIntoBatches: 5 ops / batch=3 → 2 calls
- TestParseBatchAnnotations_*: unit tests for the parser
- AT-252: --annotation-batch flag registered in gen --help
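The key-based matching described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `batchEntry`, its `annotation` field, and the `parseBatch` signature are assumptions made for the example (the real parser is `parseBatchAnnotations`, whose signature is not shown here).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// batchEntry is a hypothetical shape for one element of the LLM's JSON array.
// Only the operation_id key is named in the PR; the annotation field is assumed.
type batchEntry struct {
	OperationID string `json:"operation_id"`
	Annotation  string `json:"annotation"`
}

// parseBatch is an illustrative stand-in for parseBatchAnnotations.
// It returns a nil map on invalid JSON and drops entries without an operation_id.
func parseBatch(raw string) map[string]string {
	var entries []batchEntry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		return nil // caller treats nil as "no annotations for this batch"
	}
	out := make(map[string]string, len(entries))
	for _, e := range entries {
		if e.OperationID == "" {
			continue // silently drop entries that cannot be matched
		}
		out[e.OperationID] = e.Annotation
	}
	return out
}

func main() {
	requested := []string{"listUsers", "createUser", "deleteUser"}

	// The response is reordered, omits deleteUser, and has one entry with no id.
	raw := `[
		{"operation_id": "createUser", "annotation": "creates a user"},
		{"annotation": "orphan entry"},
		{"operation_id": "listUsers", "annotation": "lists users"}
	]`

	got := parseBatch(raw)
	for _, id := range requested {
		if a, ok := got[id]; ok {
			fmt.Printf("%s: %s\n", id, a)
		} else {
			fmt.Printf("%s: (no annotation)\n", id)
		}
	}
}
```

Because lookups go through the map rather than the array position, a reordered or shortened response only affects the ops that are actually missing.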
- Fix off-by-one: annotationBatch >= 1 now dispatches the batch path (previously batch=1 silently fell through to sequential mode)
- Add 200ms inter-batch throttle to reduce rate-limit pressure
- Cap MaxTokens at min(256*n, 8192) to stay within provider output limits
- Include op.Description in batch prompt alongside Summary for richer LLM signal
- Remove unused n variable from batchLLMProvider.Complete in tests
- Strengthen AT-252: also verify that gen runs to completion with --annotation-batch set
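As a rough sketch of how batch splitting, the token cap, and the inter-batch throttle might fit together: the helper names `splitBatches` and `capMaxTokens` are hypothetical, and the example assumes `n` in `min(256*n, 8192)` is the number of operations in the batch being sent.

```go
package main

import (
	"fmt"
	"time"
)

// capMaxTokens applies the min(256*n, 8192) rule from the commit message.
// Assumption: n is the number of operations in the batch.
func capMaxTokens(n int) int {
	tokens := 256 * n
	if tokens > 8192 {
		return 8192
	}
	return tokens
}

// splitBatches is a hypothetical helper that chunks ops into groups of at most size.
// The size < 1 guard is only defensive; in the PR, 0 selects the sequential path
// before any batching happens.
func splitBatches(ops []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(ops); start += size {
		end := start + size
		if end > len(ops) {
			end = len(ops)
		}
		batches = append(batches, ops[start:end])
	}
	return batches
}

func main() {
	ops := []string{"op1", "op2", "op3", "op4", "op5"}
	for i, batch := range splitBatches(ops, 3) {
		if i > 0 {
			// Throttle between batches to ease rate-limit pressure.
			time.Sleep(200 * time.Millisecond)
		}
		fmt.Printf("batch %d: %d ops, MaxTokens=%d\n", i+1, len(batch), capMaxTokens(len(batch)))
	}
}
```

With 5 ops and a batch size of 3 this produces two batches, matching the case covered by TestEngine_BatchAnnotation_SplitsIntoBatches.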
Summary
- `--annotation-batch N` flag to `caseforge gen`
- `0` preserves existing sequential behavior (no breaking change)

How it works
Each batch sends N operations in one prompt. The LLM returns a JSON array keyed by `operation_id` (not array index), so reordered or missing entries are handled safely without index-alignment bugs.

Failure modes (all graceful):
- Ops without a matching `operation_id` get no annotation; generation continues

Test plan
- `go test ./internal/methodology/ -run "TestEngine_Batch|TestParseBatch"`: 7 new tests pass
- `./scripts/acceptance.sh`: AT-252 passes, pre-existing failures unchanged
- `caseforge gen --spec examples/speculo-api.yaml --annotation-batch 10`: completes in ~1–2 min vs ~9 min